Role of homeostasis in learning sparse representations



Laurent U. Perrinet* 
Institut de Neurosciences Cognitives de la Mediterranee (INCM) 
CNRS / University of Provence 
13402 Marseille Cedex 20, France 

e-mail: Laurent.Perrinet@incm.cnrs-mrs.fr



Abstract 

Neurons in the input layer of primary visual cortex in primates develop edge-like receptive fields. One approach to understanding the emergence of this response is to state that neural activity has to efficiently represent sensory data with respect to the statistics of natural scenes. Furthermore, it is believed that such an efficient coding is achieved using a competition across neurons so as to generate a sparse representation, that is, one in which a relatively small number of neurons are simultaneously active. Indeed, different models of sparse coding coupled with Hebbian learning and homeostasis have been proposed that successfully match the observed emergent response. However, the specific role of homeostasis in learning such sparse representations is still largely unknown. By quantitatively assessing the efficiency of the neural representation during learning, we derive a cooperative homeostasis mechanism which optimally tunes the competition between neurons within the sparse coding algorithm. We apply this homeostasis while learning small patches taken from natural images and compare its efficiency with state-of-the-art algorithms. Results show that while different sparse coding algorithms give similar coding results, the homeostasis provides an optimal balance for the representation of natural images within the population of neurons. Competition in sparse coding is optimized when it is fair: by helping to optimize statistical competition across neurons, homeostasis is crucial in providing a more efficient solution to the emergence of independent components.
1 Introduction



The central nervous system is a dynamical, adaptive organ which constantly evolves to provide optimal decisions for interacting with the environment. The early visual pathways provide a powerful system for probing and modeling these mechanisms. For instance, it is observed that edge-like receptive fields emerge in simple cell neurons from the input layer of the primary visual cortex of primates (Chapman & Stryker, 1992). The development of cortical cell orientation tuning is an activity-dependent process, but it is still largely unknown how neural computations implement this type of unsupervised learning mechanism. A popular view is that such a population of neurons operates so that relevant sensory information from the retino-thalamic pathway is transformed (or "coded") efficiently. Such an efficient representation allows decisions to be taken optimally in higher-level layers or areas (Atick, 1992; Barlow, 2001). It is believed that this is achieved through lateral interactions which remove redundancies in the neural representation, that is, when the representation is sparse (Olshausen & Field, 1996). A representation is sparse when each input signal is associated with a relatively small number of simultaneously activated neurons within the population. For instance, the orientation selectivity of simple cells is sharper than the selectivity that would be predicted by linear filtering. As a consequence, the representation in the orientation domain is sparse and allows higher processing stages to better segregate edges in the image (Field, 1994). Sparse representations are observed prominently in cortical responses to natural stimuli, that is, to behaviorally relevant sensory inputs (Baudot et al., 2004; DeWeese et al., 2003; Vinje & Gallant, 2000). This reflects the fact that, at the learning time scale, coding is optimized relative to the statistics of natural scenes. The emergence of edge-like simple cell receptive fields in the input layer of the primary visual cortex of primates may thus be considered as a coupled coding and learning optimization problem: at the coding time scale, the sparseness of the representation is optimized for any given input, while at the learning time scale, synaptic weights are tuned to achieve, on average, optimal representation efficiency over natural scenes.

Most existing models of unsupervised learning aim at optimizing a cost defined on prior assumptions about the representation's sparseness. These sparse learning algorithms have been applied both to images (Doi et al., 2007; Fyfe & Baddeley, 1995; Olshausen & Field, 1996; Perrinet, 2004; Rehn & Sommer, 2007; Zibulevsky & Pearlmutter, 2001) and to sounds (Lewicki & Sejnowski, 2000; Smith & Lewicki, 2006). For instance, learning is accomplished in SparseNet (Olshausen & Field, 1996) on patches taken from natural images as a sequence of coding and learning steps. First, sparse coding is achieved using a gradient descent over a differentiable cost derived from a sparse prior probability distribution function of the representation. This coding step is performed using the current state of the "dictionary" of receptive fields. Then, knowing this sparse solution, learning is defined as slowly changing the dictionary using Hebbian learning. In general, the parameterization of the prior has a major impact on the results of sparse coding, and thus on the emergence of edge-like receptive fields, and requires proper tuning. In fact, the definition of the prior corresponds to an objective sparseness and does not always fit the observed probability distribution function of the coefficients. In particular, this could be a problem during learning if we use the cost to measure representation efficiency for this learning step. An alternative is to use a more generic L0 norm sparseness, by simply counting the number of non-zero coefficients. It was found that by using an algorithm like Matching Pursuit, the learning algorithm could provide results similar to SparseNet, but without the need for parametric assumptions on the prior (Perrinet, 2004; Perrinet et al., 2003; Rehn & Sommer, 2007; Smith & Lewicki, 2006). However, we observed that this class of algorithms could lead to solutions corresponding to a local minimum of the objective function: some solutions seem as efficient as others for representing the signal but do not represent edge-like features homogeneously. In particular, during the early learning phase, some cells may learn "faster" than others. There is a need for a homeostasis mechanism that will ensure convergence of learning. The goal of this work is to study the specific role of homeostasis in learning sparse representations and to propose a homeostasis mechanism which optimizes the learning of an efficient neural representation.

To achieve this, we first formulate analytically the problem of representation efficiency in a population of sensory neurons (section 2) and define the class of Sparse Hebbian Learning (SHL) algorithms. For the particular non-parametric L0 norm sparseness, we show that sparseness is optimal when the average activity within the neural population is uniformly balanced. Based on a previous implementation, Adaptive Matching Pursuit (AMP) (Perrinet, 2004; Perrinet et al., 2003), we define a homeostatic gain control mechanism that we integrate into a novel SHL algorithm (section 3). Finally, in section 4, we compare this novel algorithm with AMP and the state-of-the-art SparseNet method (Olshausen & Field, 1996). Using quantitative measures of efficiency based on constraints on the neural representation, we show the importance of the homeostasis mechanism in terms of representation efficiency. We conclude in section 5 by linking this original method with other Sparse Hebbian Learning schemes and by discussing how these may be united to improve our understanding of the emergence of edge-like simple cell receptive fields, drawing the bridge between structure (representation in a distributed network) and function (efficient coding).



2 Problem Statement 

2.1 Definition of representation efficiency 

In low-level sensory areas, the goal of neural computations is to generate efficient intermediate representations to allow efficient decision making. Classically, a representation is defined as the inversion of an internal generative model of the sensory world, that is, by inferring the sources that generated the input signal. Formally, as in Olshausen & Field (1997), we define a Linear Generative Model (LGM) for describing natural, static, grayscale images $\mathbf{I}$ (represented by column vectors of dimension $L$ pixels), by setting a "dictionary" of $M$ images (or "filters") as the $L \times M$ matrix $\Phi = \{\Phi_i\}_{1 \le i \le M}$. Knowing the associated "sources" as a vector of coefficients $\mathbf{a} = \{a_i\}_{1 \le i \le M}$, the image is defined using matrix notation as

$$\mathbf{I} = \Phi\mathbf{a} + \mathbf{n} \tag{1}$$

where $\mathbf{n}$ is a decorrelated Gaussian additive noise image of variance $\sigma_n^2$. The decorrelation of the noise is achieved by applying Principal Component Analysis to the raw input images, without loss of generality since this preprocessing is invertible. Generally, the dictionary $\Phi$ may be much larger than the dimension of the input space (that is, if $M \gg L$) and it is then said to be over-complete. However, given an over-complete dictionary, the inversion of the LGM leads to a combinatorial search and, typically, there may exist many coding solutions $\mathbf{a}$ to equation (1) for one given input $\mathbf{I}$. The goal of efficient coding is to find, given the dictionary $\Phi$ and for any observed signal $\mathbf{I}$, the "best" representation vector, that is, the one as close as possible to the sources that generated the signal. It is therefore necessary to define an efficiency criterion in order to choose between these different solutions.
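A minimal sketch of the LGM of equation (1) may help fix the notation. It is written in plain Python (standard library only) and assumes random unit-norm filters as a stand-in for a learned dictionary; the helper names `make_dictionary` and `synthesize` and the sizes L = 16, M = 64 are hypothetical choices for illustration, not the settings used in this paper.

```python
import math
import random

def make_dictionary(L, M, seed=0):
    """Hypothetical helper: M random filters of dimension L with unit L2 norm,
    returned as a list of M column vectors (the columns of Phi)."""
    rng = random.Random(seed)
    phi = []
    for _ in range(M):
        v = [rng.gauss(0.0, 1.0) for _ in range(L)]
        norm = math.sqrt(sum(x * x for x in v))
        phi.append([x / norm for x in v])
    return phi

def synthesize(phi, a, sigma_n, seed=1):
    """Linear Generative Model, equation (1): I = Phi a + n,
    with n i.i.d. Gaussian noise of standard deviation sigma_n."""
    rng = random.Random(seed)
    L = len(phi[0])
    image = [0.0] * L
    for a_i, phi_i in zip(a, phi):
        if a_i != 0.0:                      # only active sources contribute
            for k in range(L):
                image[k] += a_i * phi_i[k]
    return [x + rng.gauss(0.0, sigma_n) for x in image]

L, M = 16, 64                                # over-complete: M > L
phi = make_dictionary(L, M)
a = [0.0] * M
a[3], a[40] = 2.0, -1.5                      # a sparse source vector: 2 active of 64
I = synthesize(phi, a, sigma_n=0.0)
```

Setting `sigma_n=0` gives the noiseless image, which is exactly the linear combination of the two active filters.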

Using the LGM, we will infer the "best" coding vector as the most probable. In particular, from the physical synthesis of natural images, we know a priori that image representations are sparse: they are most likely generated by a small number of features relative to the dimension $M$ of the representation space. Similarly to Lewicki & Sejnowski (2000), this can be formalized in the probabilistic framework defined by the LGM (see equation (1)), by assuming that we know the prior distribution of the coefficients $a_i$ for natural images. The representation cost of $\mathbf{a}$ for one given natural image is then:

$$C(\mathbf{a}|\mathbf{I},\Phi) = -\log P(\mathbf{a}|\mathbf{I},\Phi) = \log Z + \frac{1}{2\sigma_n^2}\|\mathbf{I} - \Phi\mathbf{a}\|^2 - \sum_i \log P(a_i|\Phi) \tag{2}$$

where $Z$ is the partition function, which is independent of the coding, and $\|\cdot\|$ is the L2 norm in image space. This efficiency cost is measured in bits if the logarithm is of base 2, as we will assume without loss of generality thereafter. For any representation $\mathbf{a}$, the cost value corresponds to the description length (Rissanen, 1978): on the right-hand side of equation (2), the second term corresponds to the information from the image which is not coded by the representation (reconstruction cost) and thus to the information that can at best be encoded using entropic coding pixel by pixel (it is the negative log-likelihood in Bayesian terminology). The third term $S(\mathbf{a}|\Phi) = -\sum_i \log P(a_i|\Phi)$ is the representation or sparseness cost: it quantifies representation efficiency as the coding length of each coefficient of $\mathbf{a}$ taken independently, which would be achieved by entropic coding knowing the prior. In practice, the sparseness of coefficients for natural images is often defined by an ad hoc parameterization of the prior's shape. For instance, the parameterization in Olshausen & Field (1997) yields the coding cost:

$$C_1(\mathbf{a}|\mathbf{I},\Phi) = \frac{1}{2\sigma_n^2}\|\mathbf{I} - \Phi\mathbf{a}\|^2 + \beta\sum_i \log\left(1 + \frac{a_i^2}{\sigma^2}\right) \tag{3}$$

where $\beta$ corresponds to the prior's steepness and $\sigma$ to its scaling (see Figure 13.2 in Olshausen, 2002). This choice is often favored because it results in a smooth, differentiable cost for which known numerical optimization methods such as conjugate gradient may be used.
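The coding cost of equation (3) can be evaluated directly for a candidate vector $\mathbf{a}$. The sketch below assumes the dictionary is stored as a list of $M$ column vectors and uses the natural logarithm (divide by ln 2 to obtain bits); the function names are hypothetical.

```python
import math

def reconstruction_error(image, phi, a):
    """Squared L2 norm ||I - Phi a||^2, with phi given as a list of M column vectors."""
    residual = list(image)
    for a_i, phi_i in zip(a, phi):
        for k in range(len(residual)):
            residual[k] -= a_i * phi_i[k]
    return sum(r * r for r in residual)

def cost_sparsenet(image, phi, a, sigma_n, beta, sigma):
    """Coding cost of equation (3): quadratic reconstruction term plus the
    beta * sum_i log(1 + a_i^2 / sigma^2) sparseness penalty (natural log)."""
    rec = reconstruction_error(image, phi, a) / (2.0 * sigma_n ** 2)
    sparse = beta * sum(math.log(1.0 + (a_i / sigma) ** 2) for a_i in a)
    return rec + sparse
```

A perfect reconstruction is not free under this cost: the active coefficient still pays the log penalty, while an empty code pays only the reconstruction term.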

A non-parametric form of sparseness cost may be defined by considering that neurons representing the vector $\mathbf{a}$ are either active or inactive. In fact, the spiking nature of neural information demonstrates that the transition from an inactive to an active state is far more significant at the coding time scale than smooth changes of the firing rate. This is, for instance, perfectly illustrated by the binary nature of the neural code in the auditory cortex of rats (DeWeese et al., 2003). Binary codes also emerge as optimal neural codes for rapid signal transmission (Bethge et al., 2003; Nikitin et al., 2009). With a binary event-based code, the cost is only incremented when a new neuron becomes active, regardless of the analog value. Stating that an active neuron carries a bounded amount of information of $\lambda$ bits, an upper bound for the representation cost of neural activity on the receiver end is proportional to the count of active neurons, that is, to the L0 norm:

$$C_0(\mathbf{a}|\mathbf{I},\Phi) = \frac{1}{2\sigma_n^2}\|\mathbf{I} - \Phi\mathbf{a}\|^2 + \lambda\|\mathbf{a}\|_0 \tag{4}$$

This cost is similar to information criteria such as the AIC (Akaike, 1974) or distortion rate (Mallat, 1998, p. 571). This simple non-parametric cost has the advantage of being dynamic: the number of active cells for one given signal grows in time with the number of spikes reaching the receiver (see the architecture of the model in Figure 1, Left). But equation (4) defines a harder cost to optimize, since the hard L0 norm sparseness leads to a non-convex optimization problem which is NP-complete with respect to the dimension $M$ of the dictionary (Mallat, 1998, p. 418).
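For comparison with equation (3), the non-parametric cost of equation (4) only counts active coefficients. In this sketch (hypothetical name, same list-of-columns convention for the dictionary), `lam` plays the role of $\lambda$, the assumed information carried by one active neuron; its value in the example is arbitrary.

```python
def cost_l0(image, phi, a, sigma_n, lam):
    """Cost of equation (4): quadratic reconstruction term plus lam for each
    active (non-zero) coefficient, i.e. lam * ||a||_0."""
    residual = list(image)
    for a_i, phi_i in zip(a, phi):
        for k in range(len(residual)):
            residual[k] -= a_i * phi_i[k]
    rec = sum(r * r for r in residual) / (2.0 * sigma_n ** 2)
    n_active = sum(1 for a_i in a if a_i != 0.0)   # the L0 "norm": count of active units
    return rec + lam * n_active
```

With an identity dictionary, coding the image [3, 0, 0] with one active coefficient costs exactly `lam` (here 2), whereas leaving it uncoded costs the full reconstruction term 9/2 = 4.5; the magnitude of the active coefficient does not enter the sparseness term.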



2.2 Sparse Hebbian Learning (SHL) 

Given a sparse coding strategy that optimizes any representation efficiency cost as defined above, we may derive an unsupervised learning model by optimizing the dictionary $\Phi$ over natural scenes. On the one hand, the flexibility in the definition of the sparseness cost leads to a wide variety of proposed sparse coding solutions (for a review, see Pece, 2002) such as numerical optimization (Lee et al., 2007; Olshausen & Field, 1997), non-negative matrix factorization (Lee & Seung, 1999; Ranzato et al., 2007) or Matching Pursuit (Perrinet, 2004; Perrinet et al., 2003; Rehn & Sommer, 2007; Smith & Lewicki, 2006). On the other hand, these methods share the same LGM (see equation (1)) and, once the sparse coding algorithm is chosen, the learning scheme is similar.
Indeed, after every coding sweep, the efficiency of the dictionary $\Phi$ may be increased with respect to equation (2). By using an online gradient descent approach given the current sparse solution, learning may be achieved using, for all $i$:

$$\Phi_i \leftarrow \Phi_i + \eta\, a_i\, (\mathbf{I} - \Phi\mathbf{a}) \tag{5}$$

where $\eta$ is the learning rate. Similarly to Eq. 17 in Olshausen & Field (1997) or to Eq. 2 in Smith & Lewicki (2006), the relation is a linear "Hebbian" rule (Hebb, 1949) since it enhances the weight of neurons proportionally to the correlation between pre- and post-synaptic neurons. Note that there is no learning for non-activated coefficients. The novelty of this formulation compared to other linear Hebbian learning rules such as Oja (1982) is to take advantage of the sparse representation, hence the name Sparse Hebbian Learning (SHL).
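The learning rule of equation (5) can be sketched as follows, assuming the dictionary is a list of $M$ filters of dimension $L$ and that the sparse vector $\mathbf{a}$ has already been computed by some sparse coding step; `hebbian_update` is a hypothetical name. Only filters whose coefficient is non-zero are modified, which is the "sparse" part of SHL.

```python
def hebbian_update(phi, image, a, eta):
    """One step of equation (5): Phi_i <- Phi_i + eta * a_i * (I - Phi a).
    Filters with a_i == 0 are left unchanged (no learning for inactive units)."""
    L = len(image)
    residual = list(image)                        # residual I - Phi a
    for a_i, phi_i in zip(a, phi):
        for k in range(L):
            residual[k] -= a_i * phi_i[k]
    new_phi = []
    for a_i, phi_i in zip(a, phi):
        if a_i == 0.0:
            new_phi.append(list(phi_i))           # copy, unchanged
        else:
            # post-synaptic activity a_i times pre-synaptic residual signal
            new_phi.append([phi_i[k] + eta * a_i * residual[k] for k in range(L)])
    return new_phi
```

Here the update is a plain gradient step; the normalization of the filters, which the algorithms below add, is left out.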

SHL algorithms are unstable without homeostasis. In fact, starting with a random dictionary, the first filters to learn are more likely to correspond to salient features (Perrinet et al., 2004) and are therefore more likely to be selected again in subsequent learning steps. In SparseNet, the homeostatic gain control is implemented by adaptively tuning the norm of the filters. This method equalizes the variance of coefficients across neurons using a geometric stochastic learning rule. The underlying heuristic is that this introduces a bias in the choice of the active coefficients. In fact, if a neuron is not selected often, the geometric homeostasis will decrease the norm of the corresponding filter and therefore (from equation (1) and the conjugate gradient optimization) this will increase the value of the associated scalar. Finally, since the prior functions defined in equation (3) are identical for all neurons, this will increase the relative probability that the neuron is selected with a higher relative value. The parameters of this homeostatic rule are of great importance for the convergence of the global algorithm. We will now try to define a more general homeostasis mechanism derived from the optimization of representation efficiency.



2.3 Efficient cooperative homeostasis in SHL

The role of homeostasis during learning is to make sure that the distribution of neural activity is homogeneous. In fact, neurons belonging to the same neural assembly (Hebb, 1949) form a competitive network and should a priori carry similar information. This optimizes the coding efficiency of neural activity in terms of compression (van Hateren, 1993) and thus minimizes intrinsic noise (Srinivasan et al., 1982). Such a strategy is similar to introducing an intrinsic adaptation rule such that the prior firing probabilities of all neurons have a similar Laplacian probability distribution (Weber & Triesch, 2008). Dually, since neural activity in the assembly actually represents the sparse coefficients, we may understand the role of homeostasis as maximizing the average representation cost $C(\mathbf{a}|\Phi)$ at the time scale of learning. This is equivalent to saying that homeostasis should act such that at any time, invariantly to the selectivity of features in the dictionary, the probability of selecting one feature is uniform across the dictionary.
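One way to see why uniform selection is the target is to measure, in bits, the entropy of which neuron gets selected: it reaches its maximum, $\log_2 M$, only when all $M$ neurons are equally likely to be chosen. The helper below is a hypothetical illustration based on selection counts, not part of the model itself.

```python
import math

def selection_entropy(counts):
    """Entropy (in bits) of the empirical probability of selecting each neuron,
    estimated from how many times each neuron was chosen."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h

uniform = selection_entropy([25, 25, 25, 25])   # log2(4) = 2 bits, the maximum for M = 4
skewed = selection_entropy([85, 5, 5, 5])       # lower: one neuron dominates the competition
```

An unbalanced dictionary, in which one neuron wins most competitions, therefore transmits less information per selection than a balanced one.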
Figure 1: Simple neural model of sparse coding and role of homeostasis. (Left) We define the coding model as an information channel constituted by a bundle of Linear/Non-Linear spiking neurons. (L) A given input image patch is coded linearly by using the dictionary of filters and transformed by sparse coding (such as Matching Pursuit) into a sparse vector $\mathbf{a}$. Each coefficient is transformed into a driving coefficient in the (NL) layer by using a point non-linearity which (S) drives a generic spiking mechanism. (D) On the receiver end (for instance in an efferent neuron), one may then estimate the input from the neural representation pattern. This decoding is progressive and, if we assume that each spike carries a bounded amount of information, the representation cost in this model increases proportionally with the number of activated neurons. (Right) However, for a given dictionary, the distribution of sparse coefficients $a_i$, and hence the probability of a neuron's activation, is in general not uniform. We show (Lower panel) the log-probability distribution function and (Upper panel) the cumulative distribution of sparse coefficients for a dictionary of edge-like filters with similar selectivity (dotted scatter) except for one filter which was randomized (continuous line). This illustrates a typical situation which may occur during learning when some components have learned less than others: since their activity will be lower, they are less likely to be activated in the spiking mechanism and, from the Hebbian rule, they are less likely to learn. When selecting an optimal sparse set for a given input, instead of comparing sparse coefficients with respect to a threshold (vertical dashed lines), the comparison should be done on the significance value $z_i$ (horizontal dashed lines): in this particular case, the less selective neuron ($a_1 < a_2$) is selected by the homeostatic cooperation ($z_1 > z_2$). The role of homeostasis during learning is that, even if the dictionary of filters is not homogeneous, the point non-linearity in (NL) modifies sparse coding in (L) such that the probability of a neuron's activation is uniform across the population.
This optimal uniformity may be achieved in all generality for any given dictionary by using point non-linearities $z_i$ applied to the sparse coefficients: in fact, a standard method to achieve uniformity is to use an equalization of the histogram (Atick, 1992). This method may be easily derived if we know the probability distribution function $dP_i$ of the variable $a_i$, by choosing the non-linearity as the cumulative distribution function transforming any observed variable $\bar{a}_i$ into:

$$z_i(\bar{a}_i) = P_i(a_i < \bar{a}_i) = \int_{-\infty}^{\bar{a}_i} dP_i(a_i) \tag{6}$$

This is equivalent to the change of variables which transforms the sparse vector $\mathbf{a}$ into a variable with a uniform probability distribution function in $[0, 1]$. The transformed coefficients may thus be used as a normalized drive to the spiking mechanism of the individual neurons (see Figure 1, Left). This equalization process has been observed in the neural activity of a variety of species and is, for instance, perfectly illustrated in the fly's compound eye (Laughlin, 1981). It may evolve dynamically to slowly adapt to varying changes in luminance or contrast values, such as when the light diminishes at twilight (Hosoya et al., 2005).
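The histogram equalization of equation (6) can be approximated from a finite sample of past coefficients by an empirical cumulative distribution function. The sketch below assumes such a sample is available for each neuron; `make_cdf` is a hypothetical helper. The example reproduces the situation of Figure 1 (Right): a weakly selective neuron whose coefficients are ten times smaller than those of another neuron receives the same significance value at the same quantile, and can win the comparison after equalization.

```python
import bisect

def make_cdf(samples):
    """Empirical cumulative distribution function of past coefficients:
    z(x) = fraction of stored samples strictly smaller than x (equation (6))."""
    sorted_samples = sorted(samples)
    n = len(sorted_samples)
    def z(x):
        # bisect_left counts how many stored samples are < x
        return bisect.bisect_left(sorted_samples, x) / n
    return z

z_weak = make_cdf([0.1, 0.2, 0.3, 0.4])      # low-selectivity neuron: small coefficients
z_strong = make_cdf([1.0, 2.0, 3.0, 4.0])    # selective neuron: large coefficients
```

Although 0.35 is smaller than 1.5 in raw value, `z_weak(0.35)` (0.75) exceeds `z_strong(1.5)` (0.25), so the weak neuron wins once both are compared on the same uniform scale.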



This novel and simple non-parametric homeostatic method is applicable to Sparse Hebbian Learning algorithms by using this transform on the sparse coefficients. Let us imagine, for instance, that one filter corresponds to a feature of low selectivity while the others correspond to similarly selective features: as a consequence, this filter will correspond on average to lower sparse coefficients (see Figure 1, Right). However, the respective gain control function $z_i$ will be such that all transformed coefficients have the same probability density function. By using the transformed coefficients to evaluate which neuron should be active, the homeostasis will therefore optimize the information in the representation cost defined in section 2.1. We will now illustrate how it may be applied to Adaptive Matching Pursuit (Perrinet, 2004; Perrinet et al., 2003) and measure its role in the emergence of edge-like simple cell receptive fields.



3 Methods 

3.1 Matching Pursuit and Adaptive Matching Pursuit 

Let us first define Adaptive Matching Pursuit. We saw that optimizing the efficiency by minimizing the L0 norm cost leads to a combinatorial search with regard to the dimension of the dictionary. In practice, it means that for a given dictionary, finding the best sparse vector by minimizing $C_0(\mathbf{a}|\mathbf{I},\Phi)$ (see equation (4)) is hard, and thus that learning an adapted dictionary is difficult. As proposed in Perrinet et al. (2002), we may solve this problem using a greedy approach. In general, a greedy approach is applied when finding the best combination of elements is difficult to solve globally: a simpler solution is to solve the problem progressively, one element at a time.






Applied to this problem, the greedy approach corresponds to first choosing the single element $a_i\Phi_i$ that best fits the image. From the definition of the LGM, we know that for a given signal $\mathbf{I}$, the probability $P(\{a_i\}|\mathbf{I},\Phi)$ corresponding to a single source $a_i\Phi_i$ for any $i$ is maximal for the dictionary element $i^*$ with maximal correlation coefficient:

$$i^* = \mathrm{ArgMax}_i(\rho_i), \quad \text{with } \rho_i = \left\langle \frac{\mathbf{I}}{\|\mathbf{I}\|}, \frac{\Phi_i}{\|\Phi_i\|} \right\rangle \tag{7}$$

This formulation is slightly different from Eq. 21 in Olshausen & Field (1997). It should be noted that $\rho_i$ is the $L$-dimensional cosine ($L$ is the dimension of the input space) and that its absolute value is therefore bounded by 1. The value of $\arccos(\rho_i)$ would therefore give the angle of $\mathbf{I}$ with the pattern $\Phi_i$ and, in particular, the angle (modulo $2\pi$) would be equal to zero if and only if $\rho_i = 1$ (full correlation), $\pi$ if and only if $\rho_i = -1$ (full anti-correlation) and $\pm\pi/2$ if $\rho_i = 0$ (both vectors are orthogonal, there is no correlation). The associated coefficient is the scalar projection:

$$a_{i^*} = \left\langle \mathbf{I}, \frac{\Phi_{i^*}}{\|\Phi_{i^*}\|} \right\rangle \tag{8}$$

Second, knowing this choice, the image can be decomposed into

$$\mathbf{I} = a_{i^*}\frac{\Phi_{i^*}}{\|\Phi_{i^*}\|} + \mathbf{R} \tag{9}$$

where $\mathbf{R}$ is the residual image. We then repeat this 2-step process on the residual (that is, with $\mathbf{I} \leftarrow \mathbf{R}$) until some stopping criterion is met.
Hence, we have a sequential algorithm which permits us to reconstruct the signal using the list of choices and that we called Sparse Spike Coding (Perrinet et al., 2002). The coding part of the algorithm produces a sparse representation vector $\mathbf{a}$ for any input image: its L0 norm is the number of active neurons. Note that in this algorithm the norm of the filters has no influence on the choice function nor on the cost. For simplicity and without loss of generality, we will thereafter set the norm of the filters to 1: $\forall i, \|\Phi_i\| = 1$. It is equivalent to the Matching Pursuit (MP) algorithm (Mallat & Zhang, 1993) and we have previously shown that this yields an efficient algorithm for representing natural images. Using MP in the SHL scheme defined above (see section 2.2) defines Adaptive Matching Pursuit (AMP) (Perrinet, 2004; Perrinet et al., 2003) and is similar to other strategies such as those of Rehn & Sommer (2007) and Smith & Lewicki (2006). This class of SHL algorithms offers a non-parametric solution to the emergence of simple cell receptive fields but, compared to SparseNet, the results often appear to be qualitatively non-homogeneous. Moreover, the heuristic used in SparseNet for the homeostasis may not be used directly since in MP the choice is independent of the norm of the filter. The coding algorithm's efficiency may be improved using Optimized Orthogonal MP (Rebollo-Neira & Lowe, 2002) and be integrated into an SHL scheme (Rehn & Sommer, 2007). However, this optimization is separate from the problem that we try to tackle here by optimizing the representation at the learning time scale. Thus, we will now study how we may use cooperative homeostasis in order to optimize the overall coding efficiency of the dictionary learnt by AMP.
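The greedy procedure of equations (7) to (9) is short enough to write out in full. This sketch assumes unit-norm filters (as set above), a fixed number of steps as the stopping criterion, and, following equation (7), maximizes the signed correlation, which presumes that sign-inverted (ON/OFF) copies of the filters are present in the dictionary; `matching_pursuit` is a hypothetical name.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def matching_pursuit(image, phi, n_steps):
    """Greedy Matching Pursuit, equations (7)-(9), for unit-norm filters phi.
    At each step, pick the filter most correlated with the residual, store the
    scalar projection as its coefficient and subtract it from the residual."""
    residual = list(image)
    a = [0.0] * len(phi)
    for _ in range(n_steps):
        corr = [dot(residual, phi_i) for phi_i in phi]
        # with unit-norm filters, ArgMax of rho_i equals ArgMax of <R, Phi_i>
        i_star = max(range(len(phi)), key=lambda i: corr[i])
        a[i_star] += corr[i_star]                                   # equation (8)
        residual = [r - corr[i_star] * p
                    for r, p in zip(residual, phi[i_star])]         # equation (9)
    return a, residual
```

Since each step removes the projection on one unit-norm filter, the residual energy decreases by the square of the chosen coefficient at every step.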
3.2 Competition-Optimized Matching Pursuit (COMP)

We may now include cooperative homeostasis into AMP. At the coding level, it is important to note that if we simply equalize the sparse output of the MP algorithm, the transformed coefficients will indeed be uniformly distributed, but the sequence of chosen filters will not be changed. However, the MP algorithm is non-linear and the choice of an element at one step may influence the rest of the choices. This sequence is therefore crucial for the representation efficiency. In order to optimize the competition of the choice step, we may instead choose at every matching step the item in the dictionary corresponding to the most significant value computed thanks to the cooperative homeostasis (see Figure 1, Right). In practice, it means that we select the best match in the vector corresponding to the transformed coefficients $\mathbf{z}$, that is, in the vector of the residual coefficients weighted by the non-linearities defined by equation (6). This scheme thus extends the MP algorithm which we used previously by linking it to a statistical model which optimally tunes the ArgMax operator in the matching step: over natural images, for any given dictionary (and thus independently of the selectivity of the different items in the dictionary), the choice of a neuron is statistically equally probable. Thanks to cooperative homeostasis, the efficiency of every match in MP is thus maximized, hence the name Competition-Optimized Matching Pursuit (COMP).

Let us now explicitly describe the COMP coding algorithm step by step. Initially, given the signal $\mathbf{I}$, we set up for all $i$ an internal activity vector $\bar{\mathbf{a}}$ as the linear correlation $\bar{a}_i = \langle \mathbf{I}, \Phi_i \rangle$ (see equation (8)). The output sparse vector is initially set to a zero vector: $\mathbf{a} = \mathbf{0}$. Using the internal activity $\bar{\mathbf{a}}$, the neural population will evolve dynamically in an event-based manner by repeating the two following steps. First, the "Matching" step is defined by choosing the address with the most significant activity:

$$i^* = \mathrm{ArgMax}_i[z_i(\bar{a}_i)] \tag{10}$$

Then, we set the winning sparse coefficient at address $i^*$ with $a_{i^*} \leftarrow \bar{a}_{i^*}$. In the second "Pursuit" step, as in MP, the information is fed back to correlated dictionary elements by:

$$\bar{a}_i \leftarrow \bar{a}_i - \bar{a}_{i^*}\langle \Phi_{i^*}, \Phi_i \rangle \tag{11}$$

Note that after the update, the winning internal activity is zero, $\bar{a}_{i^*} = 0$, and that, as in MP, a neuron is selected at most once. Physiologically, as previously described, the pursuit step could be implemented by a lateral, correlation-based inhibition. The algorithm is iterated with equation (10) until some stopping criterion is reached, such as when the residual error energy is below the noise level $\sigma_n^2$. As in MP, since the residual is orthogonal to $\Phi_{i^*}$, the residual error energy $E$, initialized to $\|\mathbf{I}\|^2$, may be easily updated at every step as:

$$E \leftarrow E - a_{i^*}^2 \tag{12}$$

COMP transforms the image $\mathbf{I}$ into the sparse vector $\mathbf{a}$ at any precision $\sqrt{E}$. As in MP, the image may be reconstructed using $\mathbf{I} = \sum_i a_i\Phi_i$, which thus gives a solution for equation (1). COMP differs from MP only by the "Matching" step and shares many properties with MP, such as the monotonic decrease of the error (see equation (12)) or the exponential convergence of the coding. However, because of the constraint in the matching step, the decrease of $E$ will always be faster in MP than in COMP.
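The COMP coding loop of equations (10) to (12) differs from the Matching Pursuit loop only in how the winner is chosen. In this sketch, the non-linearities $z_i$ are passed as plain Python callables (an assumption; section 3.2 estimates them online), the pursuit step works on the internal activities through the Gram row $\langle \Phi_{i^*}, \Phi_i \rangle$, and an iteration cap is added as a safeguard; the name `comp_code` is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def comp_code(image, phi, z, threshold, max_iter=None):
    """Sketch of COMP coding, equations (10)-(12), for unit-norm filters phi.
    z is a list of per-neuron point non-linearities (increasing callables).
    Returns the sparse vector a, the residual energy E and the order of choices."""
    M = len(phi)
    if max_iter is None:
        max_iter = 10 * M                          # safeguard, not part of the method
    a_bar = [dot(image, phi_i) for phi_i in phi]   # internal activity <I, Phi_i>
    a = [0.0] * M                                  # output sparse vector, a = 0
    E = dot(image, image)                          # residual energy, starts at ||I||^2
    choices = []
    while E > threshold and len(choices) < max_iter:
        # "Matching": most significant activity after the transform, equation (10)
        i_star = max(range(M), key=lambda i: z[i](a_bar[i]))
        c = a_bar[i_star]
        if c == 0.0:
            break                                  # the winner has nothing left to explain
        a[i_star] += c                             # same as a[i*] <- c when chosen once
        choices.append(i_star)
        # "Pursuit": correlation-based lateral inhibition, equation (11)
        gram_row = [dot(phi[i_star], phi_i) for phi_i in phi]
        for i in range(M):
            a_bar[i] -= c * gram_row[i]
        E -= c * c                                 # energy update, equation (12)
    return a, E, choices
```

With identical, identity non-linearities the choices follow Matching Pursuit (neuron 0 first in the test case); boosting the transform of neuron 1 changes the order of the matches, which is how homeostasis acts on the competition.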

Yet, for a given dictionary, we do not know a priori the functions $z_i$, since they depend on the computation of the sparse coefficients. In practice, the $z_i$ functions are initialized for all neurons to similar arbitrary cumulative distribution functions (COMP is then equivalent to the MP algorithm since choices are not affected). Since we have at most one sparse value $a_i$ per neuron, the cumulative histogram function for each neuron for one coding sweep is $P(a_i < \bar{a}_i) = \delta(a_i < \bar{a}_i)$, where the variable $\bar{a}_i$ is the observed coefficient to be transformed and $\delta$ is the indicator function: $\delta(B) = 1$ if the boolean variable $B$ is true and 0 otherwise. We evaluate equation (6) after the end of every coding sweep using an online stochastic algorithm, for all $i$ and all $\bar{a}_i$:

$$z_i(\bar{a}_i) \leftarrow (1 - \eta_h)\, z_i(\bar{a}_i) + \eta_h\, \delta(a_i < \bar{a}_i) \tag{13}$$

where $\eta_h$ is the homeostatic learning rate. Note that this corresponds to the empirical estimation and assumes that coefficients are stationary on a time scale of $1/\eta_h$ learning steps. The time scale of homeostasis should therefore in general be less than the time scale of learning. Moreover, due to the exponential convergence of MP, for any set of components, the $z_i$ functions converge to the correct non-linear functions as defined by equation (6).
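Equation (13) updates a whole function $z_i$ after each coding sweep. A simple way to store such a function, assumed here, is to tabulate it on a fixed grid of coefficient values; the grid, the initial linear ramp and the helper names (`init_z`, `update_z`, `apply_z`) are hypothetical choices for this sketch.

```python
import bisect

def init_z(n_neurons, bins):
    """Start every neuron with the same arbitrary CDF, tabulated on a shared
    grid of coefficient values (here a linear ramp from 0 to 1)."""
    n = len(bins)
    ramp = [k / (n - 1) for k in range(n)]
    return [list(ramp) for _ in range(n_neurons)]

def update_z(z_table, bins, a, eta_h):
    """Equation (13) on the grid: z_i(x) <- (1 - eta_h) z_i(x) + eta_h * [a_i < x]."""
    for i, a_i in enumerate(a):
        row = z_table[i]
        for k, x in enumerate(bins):
            indicator = 1.0 if a_i < x else 0.0
            row[k] = (1.0 - eta_h) * row[k] + eta_h * indicator

def apply_z(z_table, bins, i, x):
    """Evaluate z_i at x using the tabulated value at the largest grid point <= x."""
    k = bisect.bisect_right(bins, x) - 1
    k = max(0, min(k, len(bins) - 1))
    return z_table[i][k]
```

For a neuron that keeps producing the coefficient 1.5, the tabulated function converges to a step at 1.5, the cumulative distribution of that (degenerate) coefficient.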

3.3 Adaptive Sparse Spike Coding (aSSC) 

We may finally apply COMP to Sparse Hebbian Learning (see section 2.2). Since the efficiency is inspired by the spiking nature of neural representations, we call this algorithm adaptive Sparse Spike Coding (aSSC). From the definition of COMP, we know that whatever the dictionary, the competition between filters will be fair thanks to the cooperative homeostasis. We add no other homeostatic regulation. We normalize the filters' energy since it is a free parameter in equation (7). In summary, the whole learning algorithm is given by the following nested loops in pseudo-code:

1. Initialize the point non-linear gain functions $z_i$ to similar cumulative distribution functions and the components $\phi_i$ to random points on the unit $L$-dimensional sphere,

2. repeat until learning converged:

(a) draw a signal $I$ from the database, its energy is $E = \|I\|^2$,

(b) set the sparse vector $a$ to zero, initialize $\bar{a}_i = \langle I, \phi_i \rangle$ for all $i$,

(c) while the residual energy $E$ is above a given threshold do:

i. select the best match: $i^* = \mathrm{ArgMax}_i [z_i(\bar{a}_i)]$,

ii. set the sparse coefficient: $a_{i^*} = \bar{a}_{i^*}$,

iii. update residual coefficients: $\forall i, \bar{a}_i \leftarrow \bar{a}_i - a_{i^*} \langle \phi_{i^*}, \phi_i \rangle$,

iv. update energy: $E \leftarrow E - a_{i^*}^2$.

(d) when we have the sparse representation vector $a$, apply $\forall i$:

i. modify dictionary: $\phi_i \leftarrow \phi_i + \eta\, a_i (I - \Phi a)$,

ii. normalize dictionary: $\phi_i \leftarrow \phi_i / \|\phi_i\|$,

iii. update homeostasis functions: $z_i(\cdot) \leftarrow (1 - \eta_h)\, z_i(\cdot) + \eta_h\, \delta(a_i \leq \cdot)$.

Figure 2: Comparison of the dictionaries obtained with SparseNet and aSSC. We show the results of Sparse Hebbian Learning using two different sparse coding algorithms at convergence (20000 learning steps): (Left) conjugate gradient function (CGF) method as used in SparseNet (Olshausen & Field, 1997) and (Right) COMP as used in aSSC. Filters of the same size as the image patches are presented in a matrix (separated by a black border). Note that their position in the matrix is arbitrary, as in ICA.
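The nested loops above can be sketched in NumPy as follows. This is a toy sketch, not the reference implementation, and it rests on several assumptions: it uses the absolute-value (ON/OFF) variant described in section 4.1, samples each $z_i$ on a fixed grid and evaluates it by linear interpolation, stops coding on a relative energy threshold with a step cap, takes the Hebbian form $\phi_i \leftarrow \phi_i + \eta\, a_i (I - \Phi a)$ for step 2(d)i, and runs on random vectors rather than whitened natural image patches. Function names and parameter values are hypothetical.

```python
import numpy as np

def z_eval(z, grid, values):
    # Evaluate each neuron's gain function z_i (sampled on `grid`) at its own value
    return np.array([np.interp(v, grid, z_i) for v, z_i in zip(values, z)])

def comp(I, Phi, z, grid, rel_threshold=0.05, max_steps=100):
    """Steps 2(b)-(c): COMP coding of one signal, absolute-value (ON/OFF) variant.

    Phi : (M, L) dictionary, one unit-norm filter phi_i per row
    z   : (M, K) gain functions z_i sampled on the increasing array `grid`
    """
    a = np.zeros(Phi.shape[0])
    a_bar = Phi @ I                        # residual correlations <I, phi_i>
    gram = Phi @ Phi.T                     # <phi_i, phi_j>: lateral interaction weights
    E0 = E = float(I @ I)                  # signal energy, then residual energy
    for _ in range(max_steps):
        if E <= rel_threshold * E0:
            break
        i_star = int(np.argmax(z_eval(z, grid, np.abs(a_bar))))   # step i
        c = a_bar[i_star]
        a[i_star] += c                     # step ii ("set" on first selection; MP accumulates on re-selection)
        a_bar = a_bar - c * gram[i_star]   # step iii
        E -= c ** 2                        # step iv (valid because ||phi_i|| = 1)
    return a, E

def assc_step(I, Phi, z, grid, eta=0.01, eta_h=0.01):
    """One pass of loop 2: code I with COMP, then apply steps 2(d)i-iii."""
    a, _ = comp(I, Phi, z, grid)
    residual = I - Phi.T @ a                                  # I - Phi a
    Phi = Phi + eta * np.outer(a, residual)                   # assumed Hebbian form
    Phi = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)    # renormalize filters
    z = (1 - eta_h) * z + eta_h * (np.abs(a)[:, None] <= grid[None, :])  # eq. 13, |a_i| variant
    return Phi, z, a

# Toy usage on random data (not natural images)
rng = np.random.default_rng(0)
M, L, K = 8, 4, 50
grid = np.linspace(0.0, 5.0, K)
Phi = rng.standard_normal((M, L))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)             # step 1: random unit vectors
z = np.tile(np.linspace(0.0, 1.0, K), (M, 1))                 # step 1: identical CDFs
for _ in range(200):
    Phi, z, a = assc_step(rng.standard_normal(L), Phi, z, grid)
```

Note how neurons that stay silent accumulate indicator values of 1 at every grid point, which raises their $z_i$ and therefore their chance of winning the next ArgMax: this is the fairness mechanism of the cooperative homeostasis in its simplest form.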



4 Results on natural images 

The aSSC algorithm differs from the SparseNet algorithm in its MP-based sparse coding algorithm and in its cooperative homeostasis. Using natural images, we evaluate the relative contribution of these two mechanisms to the representation efficiency.



4.1 Receptive field formation 

We first compare the dictionaries of filters obtained by both methods. We use a context and architecture similar to the experiments described in (Olshausen & Field, 1997) and specifically the same database of image patches as the SparseNet algorithm. These images are static, grayscale and whitened according to the same parameters to allow a one-to-one comparison of both algorithms. Here, we show the results for 16 x 16 image patches (so that L = 256) and the learning of M = 324 filters which are replicated as ON and OFF filters. Assuming this symmetry in the aSSC algorithm, we use the absolute value of the coefficient in equation 10 and equation 13¹, the rest of the algorithm being identical.

Results replicate the original results of Olshausen & Field (1997) and are comparable for both methods: dictionaries consist of edge-like filters similar to the receptive fields of simple cells in the primary visual cortex (see figure 2). Studying the evolution of receptive fields during learning shows that they first represent any salient feature (such as sharp corners or edges), because these correspond to larger Lipschitz coefficients (Perrinet et al., 2004). If a receptive field contains multiple singularities, only the most salient remains later on during learning: due to the competition between filters, the algorithm eliminates features that are duplicated in the dictionary. Filters which have already converged to independent components are selected sparsely and with high associated coefficients, but they learn more slowly since the corresponding error is small (see equation 5). We observe for both algorithms that, for very long learning times, the solution is not fixed and edges may slowly drift from one orientation to another while global efficiency remains stable. This is due to the fact that there are many solutions to the same problem (note, for instance, that solutions are invariant up to a permutation of neurons' addresses). It is possible to decrease these degrees of freedom by including, for instance, topological links between filters (Bednar et al., 2004). Qualitatively, the main difference between both results is that filters produced by aSSC look more diverse and broad (so that they often overlap), while the filters produced by SparseNet are more localized and thin.

We also perform robustness experiments to determine the range of learning parameters for which these algorithms converge. One advantage of aSSC is that it is based on a non-parametric sparse coding and a non-parametric homeostasis rule and is entirely described by 2 structural parameters ($L$ and $M$) and 2 learning parameters ($\eta$ and $\eta_h$), while the parameterization of the prior and of the homeostasis for SparseNet requires 5 more parameters to adjust (3 for the prior, 2 for the homeostasis). By observing the probability distribution function of selected filters at convergence, we find that homeostasis in aSSC converges for a wide range of $\eta_h$ values (see equation 13). Furthermore, we observe that at convergence the $z_i$ functions become very similar (see dotted lines in figure 1, Right) and that homeostasis does not favor the selection of any particular neuron as strongly as at the beginning of the learning. Therefore, thanks to the homeostasis, equilibrium is reached when the dictionary homogeneously represents different features in natural images, that is, when filters have similar selectivities. Finally, we observe the counter-intuitive result that the non-linearities implementing cooperative homeostasis are important for coding only during the learning period and may be ignored for coding after convergence, since at this point the non-linearities are the same for all neurons.

¹ That is, following section 3.3, step 2-c-i becomes $i^* = \mathrm{ArgMax}_i [z_i(|\bar{a}_i|)]$, and step 2-d-iii is changed to $z_i(\cdot) \leftarrow (1 - \eta_h)\, z_i(\cdot) + \eta_h\, \delta(|a_i| \leq \cdot)$.

Both dictionaries appear qualitatively different; for instance, the parameters of the emerging edges (frequency, length, width) are distributed differently. In fact, it seems that rather than the shape of each dictionary element taken individually, it is their distribution in image space that yields different efficiencies. Such an analysis of the distribution of filter shapes was performed quantitatively for SparseNet in (Lewicki & Sejnowski, 2000). The filters were fitted by Gabor functions (Jones & Palmer, 1987). A recent study compares the distribution of fitted Gabor parameters between the model and receptive fields obtained from neurophysiological experiments conducted in the primary visual cortex of macaques (Rehn & Sommer, 2007). It showed that their SHL model based on Optimized Orthogonal MP matches physiological observations better than SparseNet. However, there is no theoretical basis for the assumption that receptive fields' shapes should be well fitted by Gabor functions (Saito, 2001), and the variety of shapes observed in biological systems may, for instance, reflect adaptive regulation mechanisms when reaching different optimal sparseness levels (Assisi et al., 2007). Moreover, even though this type of quantitative method is certainly necessary, it is not sufficient to understand the role of each individual mechanism in the emergence of edge-like receptive fields. To assess the relative roles of coding and homeostasis in SHL, we instead compare these different dictionaries quantitatively in terms of representation efficiency.



4.2 Coding efficiency in SHL 

To address this issue, we first compare the quality of both dictionaries (from SparseNet and aSSC) by computing the mean efficiency of their respective coding algorithms (respectively CGF and COMP). Using $10^5$ image patches drawn from the natural image database, we perform the progressive coding of each image using both sparse coding methods. When plotting the probability distribution function of the sparse coefficients, one observes that the distributions fit well the bivariate model introduced in (Olshausen & Millman, 2000), where a subset of the coefficients is null (see figure 3, Left). Log-probability distributions of non-zero coefficients are quadratic with the initial random dictionaries. At convergence, non-zero coefficients fit a Laplacian probability distribution function well. Measuring the mean kurtosis of the resulting sparse vectors proves to be very sensitive and a poor indicator of global efficiency, in particular at the beginning of the coding, when many coefficients are still strictly zero. In general, COMP provides a sparser final distribution. Dually, plotting the decrease of the sorted coefficients as a function of their rank shows that coefficients for COMP are first higher and then decrease more quickly, due to the link between the $z_i$ functions and the function of sorted coefficients (see equation 6). As a consequence, a Laplacian bivariate model for the distribution of sparse coefficients emerges from the statistics of natural images. The advantage of aSSC is that this emergence does not depend on a parametric model of the prior.
In a second analysis, we compare the efficiency of both methods while varying the number of active coefficients (the $L_0$ norm). We perform this in COMP by simply measuring the residual error ($L_2$ norm) with respect to the coding step. To compare this method with the conjugate gradient method, we use a 2-pass sparse coding: a first pass identifies the best neurons for a fixed number of active coefficients, while a second pass optimizes the coefficients for this set of "active" vectors. This method was also used in (Rehn & Sommer, 2007) and proved to be fair when comparing both algorithms. We observe in a robust manner that the greedy solution to the hard problem (that is, COMP) is as efficient as the conjugate gradient used in SparseNet (see figure 3, Right). We also observe that aSSC is slightly more efficient for the cost defined in equation 3, a result which may reflect the fact that the $L_0$ norm defines a stronger sparseness constraint than the convex cost. Moreover, we compare the coding efficiency of both dictionaries using Optimized Orthogonal MP (OOMP). Results show that OOMP provides a slight coding improvement, but also confirm that both dictionaries are of similar coding efficiency, independently of their respective coding algorithm.

[Figure 3 axis labels: Sparse Coefficient; Sparseness ($L_0$ norm)]

Figure 3: Coding efficiency of SparseNet versus aSSC. We evaluate the quality of both learning schemes by comparing the coding efficiency of their respective coding algorithms, that is CGF and COMP, with the respective dictionary that was learnt (see figure 2). (Left) We show the probability distribution function of sparse coefficients obtained by both methods with random dictionaries (respectively 'SN-init' and 'aSSC-init') and with the dictionaries obtained after convergence of the respective learning schemes (respectively 'SN' and 'aSSC'). At convergence, sparse coefficients are more sparsely distributed than initially, with more kurtotic probability distribution functions for aSSC in both cases. (Right) We plot the average residual error ($L_2$ norm) as a function of the relative number of active (non-zero) coefficients. This provides a measure of the coding efficiency for each dictionary over the set of image patches (error bars are scaled to one standard deviation). The $L_0$ norm is equal to the coding step in COMP. Best results are those providing a lower error for a given sparsity (better compression) or a lower sparseness for the same error (Occam's razor). We observe similar coding results in aSSC despite its non-parametric definition. This result also holds when using the two different dictionaries with the same OOMP sparse coding algorithm: the dictionaries still have similar coding efficiencies.
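The 2-pass protocol used above to compare greedy coding with conjugate gradient can be sketched as follows. This is an illustration under assumptions not stated in the text: plain MP (without the homeostatic gain) stands in for the first pass, an ordinary least-squares fit with `np.linalg.lstsq` on the selected filters stands in for the second-pass coefficient optimization, and the function names and random toy data are hypothetical.

```python
import numpy as np

def greedy_select(I, Phi, n_active, max_iter=None):
    """First pass: pick up to n_active distinct filters (unit-norm rows of Phi) by plain MP."""
    a_bar = Phi @ I
    gram = Phi @ Phi.T
    active = []
    for _ in range(max_iter or 10 * n_active):    # guard against endless re-selection
        if len(active) == n_active:
            break
        i_star = int(np.argmax(np.abs(a_bar)))
        if i_star not in active:
            active.append(i_star)
        a_bar = a_bar - a_bar[i_star] * gram[i_star]
    return active

def two_pass_error(I, Phi, n_active):
    """Second pass: least-squares coefficients on the selected set; returns the L2 residual."""
    active = greedy_select(I, Phi, n_active)
    if not active:
        return float(np.linalg.norm(I))
    A = Phi[active].T                              # (L, n) matrix whose columns are the active filters
    coef, *_ = np.linalg.lstsq(A, I, rcond=None)
    return float(np.linalg.norm(I - A @ coef))

# Residual error as a function of the L0 norm, on a toy random dictionary
rng = np.random.default_rng(1)
Phi = rng.standard_normal((16, 8))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
I = rng.standard_normal(8)
errors = [two_pass_error(I, Phi, n) for n in range(9)]
```

Because the greedy selection is deterministic, the active set for $n + 1$ coefficients contains the set for $n$, so the least-squares residual in `errors` can only decrease as the $L_0$ norm grows, which is the kind of curve plotted in figure 3, Right.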

These results show that, without the need for a parameterization of the prior, coding in aSSC is as efficient as in SparseNet. In addition, this approach offers a number of other advantages. First, COMP simply uses a feed-forward pass with lateral interactions, while conjugate gradient is implemented as the fixed point of a recurrent network (see Figure 13.2 in Olshausen, 2002)². Moreover, we have already seen that aSSC is a non-parametric method which is controlled by fewer parameters. Therefore, applying a "higher-level" Occam's razor confirms that, for a similar overall coding efficiency, aSSC is better since it is of lower structural complexity. Finally, in SparseNet and in the algorithms defined in (Lewicki & Sejnowski, 2000; Rehn & Sommer, 2007; Smith & Lewicki, 2006), the representation is analog without explicitly defining a quantization. This is not the case in the aSSC algorithm, where cooperative homeostasis introduces a regularity in the distribution of sparse coefficients.



4.3 Role of homeostasis in representation efficiency 

In the context of an information channel such as the one implemented by a neural assembly, one should rather use the coefficients that could be decoded from the neural signal to define the reconstruction cost (see figure 1, Left). As described in section 2.1, given a dictionary $\Phi$, it is indeed more correct to consider the overall average coding and decoding cost over image patches $C(a|I, \Phi)$ (see equation 2), where $a$ corresponds to the analog vector of coefficients inferred from the neural representation. The overall transmission error may be described as the sum of the reconstruction error and the quantization error. The latter increases both with inter-trial variability and with the non-homogeneity of the represented features. It is, however, difficult to evaluate a decoding scheme in most sparse coding algorithms since this problem is generally not addressed. Our objective when defining $C_0(a|I, \Phi)$ (see equation 3) was to define sparseness as it may be represented by spiking neural representations. Using a decoding algorithm on such a representation will help us quantify the overall coding efficiency.

² A quantitative measure of the structural complexity of the different methods is given by the minimal length of a code that would implement them, this length being defined as the number of characters of the code implementing the algorithm. It would therefore depend on the machine on which it is implemented, and there is, of course, a clear advantage of aSSC on parallel architectures.

Figure 4: Cooperative homeostasis implements efficient quantization.

(Left) When switching off the cooperative homeostasis during learning, the corresponding Sparse Hebbian Learning algorithm, Adaptive Matching Pursuit (AMP), converges to a set of filters that contains some less localized filters and some high-frequency Gabor functions that correspond to more "textural" features (Perrinet et al., 2003). One may wonder whether these filters are inefficient and capture noise or whether they rather correspond to independent features of natural images in the LGM model. (Right, Inset) In fact, when plotting residual energy as a function of $L_0$ norm sparseness with the MP algorithm (as plotted in figure 3, Right), the AMP dictionary gives a slightly worse result than aSSC. (Right) Moreover, one should consider representation efficiency as that of the overall coding and decoding algorithm. We compare the efficiency of these dictionaries using the same coding method (SSC) and the same decoding method (using rank-quantized coefficients). Representation length for this decoding method is proportional to the $L_0$ norm, with $\lambda = \log_2(M)/L \approx 0.032$ bits per coefficient and per pixel, as defined in equation 3. We observe that the dictionary obtained by aSSC is more efficient than the one obtained by AMP, while the dictionary obtained with SparseNet (SN) gives an intermediate result thanks to its geometric homeostasis: introducing cooperative homeostasis globally improves neural representation.

An effective decoding algorithm is to estimate the analog values of the sparse vector (and thus reconstruct the signal) from the order of neurons' activation in the sparse vector (Perrinet, 2007, Section 2.2). In fact, knowing the address $i_0$ of the fiber corresponding to the maximal value, we may infer that it has been produced by an analog value on the emitter side in the highest quantile of the probability distribution function of $a_{i_0}$. We may therefore decode the corresponding value with the best estimate, which is given by the average maximum sparse coefficient for this neuron, obtained by inverting $z_{i_0}$ (see equation 5)³: $\hat{a}_{i_0} = z_{i_0}^{-1}(1)$. This also holds for the following coefficients. We write $r/M$ for the relative rank of the $r$-th coefficient and $o$ for the order function which gives the address of the winning neuron at rank $r$. Since $z_{o(r)}(a_{o(r)}) = 1 - r/M$, we can reconstruct the corresponding value as

$$ \hat{a}_{o(r)} = z_{o(r)}^{-1}\left(1 - \frac{r}{M}\right) \qquad (14) $$

Physiologically, equation 14 could be implemented using interneurons which would "count" the number of received spikes and modulate the efficiency of synaptic events on receiver efferent neurons, for instance with shunting inhibition (Delorme & Thorpe, 2003). Recent findings show that this type of code may be used in cortical in vitro recurrent networks (Shahaf et al., 2008). This corresponds to a generalized rank coding scheme. However, this quantization does not require that neural information explicitly carries rank information. In fact, this scheme is rather general and is analogous to scalar quantization using the modulation function $z^{-1}$ as a Look-Up Table. It is very likely that fine temporal information such as inter-spike intervals also plays a role in neural information transmission. As in other decoding schemes, the quantization error directly depends on the variability of the modulation functions across trials (Perrinet et al., 2004). This scheme thus shows a representative behavior for the retrieval of information from spiking neural activity.
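The rank-based decoding of equation 14, with $z^{-1}$ used as a Look-Up Table, can be sketched as below. This is a hedged illustration, not the author's implementation: `np.quantile` is swapped in as a stand-in for inverting the empirical cumulative function $z_i$ (the text instead averages the coefficients sharing a given $z$ value), the table is read at the nearest tabulated level, absolute coefficient values are assumed (ON/OFF variant), and the helper names and toy Laplacian data are hypothetical.

```python
import numpy as np

def build_inverse_tables(coeffs, n_levels=101):
    """Tabulate z_i^{-1} on p = 0, 1/(n_levels-1), ..., 1 for every neuron.

    coeffs : (N, M) absolute coefficients |a_i| collected over N coding sweeps
             (zeros included for sweeps in which neuron i stayed silent)
    np.quantile is used here as a stand-in for inverting the empirical CDF z_i.
    """
    p = np.linspace(0.0, 1.0, n_levels)
    return np.quantile(coeffs, p, axis=0).T           # shape (M, n_levels)

def rank_decode(order, tables, M):
    """Equation 14: a_hat[o(r)] = z_{o(r)}^{-1}(1 - r / M), using addresses only.

    order : addresses o(0), o(1), ... in order of activation (rank 0 = first spike)
    """
    n_levels = tables.shape[1]
    a_hat = np.zeros(M)
    for r, i in enumerate(order):
        p = 1.0 - r / M                                # relative rank -> quantile level
        k = int(round(p * (n_levels - 1)))             # nearest tabulated level
        a_hat[i] = tables[i, k]                        # Look-Up Table read-out
    return a_hat

# Toy usage: tables learnt from random Laplacian coefficients, then decoding of 3 spikes
rng = np.random.default_rng(0)
train = np.abs(rng.laplace(size=(1000, 5)))
tables = build_inverse_tables(train)
print(rank_decode([3, 1, 4], tables, M=5))
```

Only the sequence of addresses crosses the channel; the analog values are reconstructed on the receiver side from the tables, which is why the quantization error depends on how similar the $z_i$ functions are across neurons.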

To evaluate the specific role of cooperative homeostasis, we compare the previous dictionaries (see figure 2) with the one obtained by Adaptive Matching Pursuit (AMP). In fact, SparseNet and aSSC differ in their homeostasis but also in their sparse coding, whereas the only difference between aSSC and AMP is the introduction of cooperative homeostasis. To obtain the AMP solution, we use the same sparse coding algorithm but switch off the cooperative homeostasis during learning ($\eta_h = 0$ in equation 13). We observe at convergence that the dictionary corresponds qualitatively to features which are different from those of aSSC and SparseNet (see figure 4, Left). In particular, we observe the emergence of Gabor functions with broader width which better match textures. These filters correspond to lower Lipschitz coefficients (Perrinet et al., 2004), and because of their lower saliency, these textural filters are more likely to be selected with lower correlation coefficients. They are closer to the Fourier filters obtained using Principal Component Analysis (Fyfe & Baddeley, 1995) and are still optimal for coding arbitrary image patches such as noise (Zhaoping, 2006). When we plot the $L_2$ norm with respect to the $L_0$ norm for the different dictionaries with the same MP coding algorithm, averaged over a set of $10^5$ image patches from natural scenes (see figure 4, Right Inset), the resulting dictionary from AMP is less efficient than those obtained with aSSC and SparseNet. This is not an expected behavior since COMP is more constrained than MP (MP is the "greediest" solution), and using both methods with a similar dictionary would necessarily give an advantage to MP: AMP thus reached a local minimum of the coding cost. To understand why, recall that in the aSSC algorithm, the cooperative homeostasis constraint, by its definition in equation 5, plays the role of a gain control and that the point non-linearity from equation 10 ensures that all filters are selected equally. Compared to AMP, textured elements are "boosted" during learning relative to a more generic salient edge component and are thus more likely to evolve (see figure 1, Right). This explains why they end up being less probable and why, at convergence, there are no textured filters in the dictionary obtained with aSSC.

³ Mathematically, the $z_i$ are not always strictly increasing, and we state here that $z_i^{-1}(z)$ is defined in a unique way as the average value of the coefficients $a_i$ such that $z_i(a_i) = z$.

Finally, we quantitatively test the representation efficiency of these different dictionaries with the same quantization scheme. At the decoding level, we compute in all cases the modulation functions as defined in equation 14 on a set of $10^5$ image patches from natural scenes. Since addresses may be generated by any of the $M$ neurons, the representation cost is defined as $\lambda = \log_2(M)$ bits per chosen address (see equation 3). Then, when using the quantization (see equation 14), the AMP approach displays a larger variability, reflecting the lack of homogeneity of the features represented by the dictionary: there is a much larger reconstruction error and a slower decrease of the error's energy (see figure 4, Right). The aSSC, on the contrary, is adapted to quantization thanks to the cooperative homeostasis and consequently yields a more regular decrease of coefficients as a function of rank, that is, a lower quantization error. The dictionary obtained with the SparseNet algorithm yields an intermediate result. This shows that the heuristic implementing the homeostasis in this algorithm regulates the choices of the elements during learning relatively well. It also explains why the three parameters of the homeostasis algorithm had to be properly tuned to fit the dynamics of the heuristics. Results therefore show that homeostasis optimizes the efficiency of the neural representation during learning and that cooperative homeostasis provides a simple and effective optimization scheme.
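Because rank-based decoding transmits only the addresses of the active neurons, the representation length reduces to simple arithmetic on the $L_0$ norm. The sketch below assumes that the relevant dictionary size is M = 324 (ON/OFF replication not counted) and L = 256 pixels; with these values it gives about 0.0326 bits per coefficient and per pixel, close to the value of about 0.032 quoted for figure 4. The helper name is hypothetical.

```python
import math

def address_bits(n_active, M):
    """Bits needed to transmit n_active addresses, each one of M neurons (lambda = log2(M))."""
    return n_active * math.log2(M)

L, M = 256, 324                                  # 16 x 16 patches, 324 filters (section 4.1)
lam_per_pixel = math.log2(M) / L                 # bits per coefficient and per pixel
print(f"{lam_per_pixel:.4f}")                    # prints 0.0326
print(address_bits(10, M) / L)                   # cost of a 10-coefficient code, per pixel
```

No bits are spent on the analog values themselves: they are recovered from the Look-Up Tables of equation 14, so the total cost grows linearly with the number of active coefficients.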
5 Discussion



We have shown in this paper that homeostasis plays an essential role in Sparse Hebbian Learning (SHL) schemes and thus in our understanding of the emergence of simple cell receptive fields. First, using statistical inference and information theory, we have proposed a quantitative cost for the coding efficiency based on a non-parametric model using the number of active neurons, that is, the $L_0$ norm of the representation vector. This allowed us to design a cooperative homeostasis rule based on neurophysiological observations (Laughlin, 1981). This rule optimizes the competition between neurons by simply constraining every selection of an active neuron to be equiprobable. This homeostasis defines a new sparse coding algorithm, COMP, and a new SHL scheme, aSSC. Then, we have confirmed that the aSSC scheme provides an efficient model for the formation of simple cell receptive fields, similarly to other approaches. The sparse coding algorithms in these schemes are variants of conjugate gradient or of Matching Pursuit. They are based on correlation-based inhibition since this is necessary to remove redundancies from the linear representation. This is consistent with the observation that lateral interactions are necessary for the formation of elongated receptive fields (Bolz & Gilbert, 1989). With a correct tuning of parameters, all schemes show the emergence of edge-like filters. The specific coding algorithm used to obtain this sparseness appears to be of secondary importance as long as it is adapted to the data and yields sufficiently efficient sparse representation vectors. However, the resulting dictionaries vary qualitatively among these schemes, and it was unclear which algorithm was the most efficient and what the individual role of the different mechanisms that constitute SHL schemes was. At the learning level, we have shown that the homeostasis mechanism has a great influence on the qualitative distribution of learned filters. In particular, by comparing the coding and decoding efficiency of aSSC with and without this specific homeostasis, we have shown that cooperative homeostasis optimizes overall representation efficiency. This efficiency is comparable with that of SparseNet, but with the advantage that our unsupervised learning model is non-parametric and does not need to be properly tuned.

This work might be advantageously applied to signal processing problems. First, we saw that optimizing the representation cost maximizes the independence between features and is related to the goal of ICA. Since we have built a solution to the LGM inverse problem that is more efficient than standard methods such as the SparseNet algorithm, it is a good candidate solution to Blind Source Separation problems. Second, at the coding level, we optimized the efficiency of Matching Pursuit in the COMP algorithm by including an adaptive cooperative homeostasis mechanism. We showed that, for a given compression level, image patches are more efficiently coded than with the Matching Pursuit algorithm. Since we have shown previously that MP compares favorably with compression methods such as JPEG when using a fixed log-Gabor filter dictionary (Fischer et al., 2007), we can predict that COMP should provide promising results for image representation. An advantage over other sparse coding schemes is that it provides a progressive dynamical result, while the conjugate gradient method has to be recomputed for every different number of coefficients. The most relevant information is propagated first, and progressive reconstruction may be interrupted at any time. Finally, a main advantage of this type of neuromorphic algorithm is that it uses a simple set of operations: computing the correlation, applying the point non-linearity from a Look-Up Table, choosing the ArgMax, doing a subtraction, and retrieving a value from a Look-Up Table. In particular, the complexity of these operations, such as the ArgMax operator, would in theory not depend on the dimension of the system in parallel machines, and the transfer of this technology to neuromorphic hardware such as aVLSIs (Brüderle et al., 2009; Schemmel et al., 2006) should provide a supra-linear gain of performance.

In this paper, we focused on transient input signals and relatively abstract neurons. This choice was made to highlight the powerful function of the parallel and temporal competition between neurons, in contrast to traditional analog and sequential strategies using analog spike frequency representations. This strategy allowed us to compare the proposed learning scheme with state-of-the-art algorithms. One obvious extension of the algorithm is to implement learning with more realistic inputs. In fact, sparseness in image patches is only local, while it is also spatial and temporal in whole-field natural scenes: for instance, it is highly probable in whole natural images that large parts of the space, such as the sky, are flat and contain no information. Our results should thus be taken as a lower bound for the efficiency of aSSC in natural scenes. This also suggests the extension to representations with some built-in invariances, such as translation and scaling. A Gaussian pyramid, for instance, provides a multi-scale representation where the set of learned filters would become a dictionary of mother wavelets (Perrinet, 2007, Section 3.3.4). Such an extension leads to a fundamental question: how does representation efficiency evolve with the number $M$ of elements in the dictionary, that is, with the complexity of the representation? In fact, when increasing the over-completeness in aSSC, one observes the emergence of different classes of edge filters: first different positions, then different orientations of edges, followed by different frequencies, and so on. This specific order indicates the existence of an underlying hierarchy for the synthesis of natural scenes. This hierarchy seems to correspond to the level of importance of the different transformations that are learned by the system, respectively translation, rotation and scaling. Exploring the efficiency results for different dimensions of the dictionary in aSSC will thus give a quantitative evaluation of the optimal complexity of the model needed to describe images in terms of a trade-off between accuracy and generality. But it may also provide a model for the clustering of the low-level visual system into different areas, such as the emergence of position-independent representations in the ventral visual pathway versus motion-selective neurons in the dorsal visual pathway.